Improved QMDP Policy for Partially Observable Markov Decision Processes in Large Domains: Embedding Exploration Dynamics

Authors

  • Giorgos Apostolikas
  • Spyros G. Tzafestas
Abstract

Artificial Intelligence techniques have primarily focused on domains in which, at each time step, the state of the world is known to the system. Such domains can be modeled as a Markov Decision Process (MDP). Action and planning policies for MDPs have been studied extensively, and several efficient methods exist. However, in real-world problems, pieces of information useful for the process of action selection are often missing. The theory of Partially Observable Markov Decision Processes (POMDPs) covers the problem domain in which the full state of the environment is not directly perceivable by the agent. Current algorithms for the exact solution of POMDPs are only applicable to domains with a small number of states. To cope with more extended state spaces, a number of methods that achieve sub-optimal solutions exist, and among these the QMDP approach seems to be the best. We introduce a novel technique, called Explorative QMDP (EQMDP), which constitutes an important enhancement of the QMDP method. To the best of the authors' knowledge, EQMDP is currently the most efficient method applicable to large POMDP domains.
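The baseline that EQMDP enhances can be summarized in two steps: solve the underlying fully observable MDP for its Q-values, then act greedily with respect to the belief-weighted Q-values. A minimal sketch of this standard QMDP approximation follows (this is not the authors' EQMDP variant, whose exploration dynamics are not detailed in the abstract; the array shapes and function names are illustrative assumptions):

```python
import numpy as np

def qmdp_q_values(T, R, gamma=0.95, tol=1e-6):
    """Value-iterate on the underlying MDP to obtain Q(s, a).

    T: transition tensor of shape (A, S, S), T[a, s, s'] = P(s' | s, a).
    R: reward matrix of shape (S, A).
    """
    n_actions, n_states, _ = T.shape
    Q = np.zeros((n_states, n_actions))
    while True:
        V = Q.max(axis=1)                              # V(s) = max_a Q(s, a)
        # Bellman backup: Q(s,a) = R(s,a) + gamma * sum_s' T[a,s,s'] V(s')
        Q_new = R + gamma * np.einsum('ast,t->sa', T, V)
        if np.abs(Q_new - Q).max() < tol:
            return Q_new
        Q = Q_new

def qmdp_action(belief, Q):
    """QMDP action selection: argmax_a sum_s b(s) Q(s, a)."""
    return int(np.argmax(belief @ Q))
```

The approximation assumes all state uncertainty vanishes after the next step, so it never selects actions purely to gain information; that limitation is exactly what exploration-aware variants such as the authors' EQMDP aim to address.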


Similar articles

A POMDP Framework to Find Optimal Inspection and Maintenance Policies via Availability and Profit Maximization for Manufacturing Systems

Maintenance can either increase or decrease a system's availability, so it is valuable to evaluate a maintenance policy from both the cost and availability points of view, simultaneously and according to the decision maker's priorities. This study proposes a Partially Observable Markov Decision Process (POMDP) framework for a partially observable and stochastically deteriorating syste...


Exploration in POMDPs

In recent work, Bayesian methods for exploration in Markov decision processes (MDPs) and for solving known partially-observable Markov decision processes (POMDPs) have been proposed. In this paper we review the similarities and differences between those two domains and propose methods to deal with them simultaneously. This enables us to attack the Bayes-optimal reinforcement learning problem in...


Learning for Multiagent Decentralized Control in Large Partially Observable Stochastic Environments

This paper presents a probabilistic framework for learning decentralized control policies for cooperative multiagent systems operating in a large partially observable stochastic environment based on batch data (trajectories). In decentralized domains, because of communication limitations, the agents cannot share their entire belief states, so execution must proceed based on local information. D...


The Infinite Regionalized Policy Representation

We introduce the infinite regionalized policy representation (iRPR), as a nonparametric policy for reinforcement learning in partially observable Markov decision processes (POMDPs). The iRPR assumes an unbounded set of decision states a priori, and infers the number of states to represent the policy given the experiences. We propose algorithms for learning the number of decision states while main...


An Improved Policy Iteration Algorithm for Partially Observable MDPs

A new policy iteration algorithm for partially observable Markov decision processes is presented that is simpler and more efficient than an earlier policy iteration algorithm of Sondik (1971, 1978). The key simplification is representation of a policy as a finite-state controller. This representation makes policy evaluation straightforward. The paper's contribution is to show that the dynamic-progra...



Journal:
  • Intelligent Automation & Soft Computing

Volume 10, Issue —

Pages —

Publication date: 2004